Browser Automation Tools

Referenced Files

services/browser_use_service.py, routers/browser_use.py, prompts/browser_use.py, tools/browser_use/tool.py, models/requests/agent.py, models/response/agent.py, utils/agent_sanitizer.py, executeActions.ts, executeAgent.ts, background.ts, content.ts, AgentExecutor.tsx, agent-map.ts, agents/react_agent.py, agents/react_tools.py

Introduction

This document explains the browser automation tools and the AgentService implementation behind them. It covers how browser action plans are generated, how the page's DOM structure is captured and analyzed, and how scripts are created from it. It documents the BrowserActionInput schema, constraint handling, and target URL navigation. It also describes the integration with the agent system, action planning, and execution patterns. Practical examples, common use cases, security considerations, performance notes, and debugging guidance are included for both technical and non-technical readers.

Project Structure

The browser automation system spans backend services, FastAPI routes, LangChain prompts, a browser extension, and a React-based UI. The key layers are:

Backend API and service layer for generating action plans
Prompt templates and sanitization utilities
Extension background script, content script, and side panel executor
Agent orchestration and tool integration

Diagram: component graph
Extension UI: Side Panel Executor (AgentExecutor.tsx), Agent Map (agent-map.ts)
Extension Runtime: Background Script (background.ts), Content Script (content.ts), Action Execution Utils (executeActions.ts), Agent Runner (executeAgent.ts)
Backend: FastAPI Router (routers/browser_use.py), AgentService (services/browser_use_service.py), Prompt Template (prompts/browser_use.py), Sanitizer (utils/agent_sanitizer.py), Pydantic Models (models/requests/agent.py, models/response/agent.py)
Agent System: React Agent Graph (agents/react_agent.py), Agent Tools (agents/react_tools.py)
Edges: Agent Map -> Side Panel Executor -> Background Script; Background Script -> FastAPI Router, Agent Runner, Action Execution Utils, Content Script; FastAPI Router -> AgentService; AgentService -> Prompt Template, Sanitizer, Pydantic Models; React Agent Graph -> Agent Tools -> AgentService

Core Components

AgentService: Generates a JSON action plan from a user goal, an optional target URL, the DOM structure, and constraints. It formats the DOM info, constructs a user prompt, invokes the LLM, and validates and sanitizes the resulting JSON.
FastAPI Router: Exposes a POST endpoint that generates scripts, validating inputs and returning structured responses.
BrowserActionInput Schema: Defines the input contract for the browser action tool: goal, target_url, dom_structure, and constraints.
Prompt Template: Provides a system prompt with examples of DOM manipulation and tab/window control actions. It demands strict JSON output and states critical rules, such as never issuing DOM actions on chrome:// pages and using OPEN_TAB with a fully constructed search URL for search intents.
Sanitizer: Validates the generated JSON plan, ensuring required fields, valid action types, and safe custom script patterns.
Extension Integration: The side panel executor triggers generation, captures DOM context, and executes actions via background and content scripts.

Architecture Overview

The system orchestrates a user intent into executable browser actions:

The side panel collects the goal and optional URL, optionally resolves a tab context, and captures the DOM.
The backend receives the request, builds a prompt enriched with DOM info, and asks the LLM to produce a JSON action plan.
The sanitizer validates the plan and returns it to the UI.
The UI executes the plan via the extension’s background script, which performs tab/window control or injects DOM actions into the page.
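
To make the hand-off between the side panel and the backend concrete, the sketch below assembles a request body from the four documented inputs. The helper name build_generate_script_payload is hypothetical. The assumption that the body is flat JSON with exactly these keys should be checked against GenerateScriptRequest in models/requests/agent.py.

```python
import json
from typing import Any, Optional


# Hypothetical helper: builds the JSON body for POST /api/agent/generate-script.
# Field names follow the documented inputs; the wire format is an assumption.
def build_generate_script_payload(
    goal: str,
    target_url: Optional[str] = None,
    dom_structure: Optional[dict[str, Any]] = None,
    constraints: Optional[dict[str, Any]] = None,
) -> str:
    """Serialize the inputs, sending optional fields only when they are set."""
    body: dict[str, Any] = {"goal": goal}
    if target_url:
        body["target_url"] = target_url
    if dom_structure:
        body["dom_structure"] = dom_structure  # e.g. {url, title, interactive, raw_html}
    if constraints:
        body["constraints"] = constraints
    return json.dumps(body)


# Example: a search goal issued from a new-tab page.
payload = build_generate_script_payload(
    "Search for LangGraph documentation",
    dom_structure={"url": "chrome://newtab/", "title": "New Tab", "interactive": []},
)
```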

Diagram: request sequence
User -> Side Panel (AgentExecutor.tsx): enter goal and optional URL.
Side Panel -> Background Script (background.ts): capture DOM and prepare payload.
Side Panel -> FastAPI Router (routers/browser_use.py): POST /api/agent/generate-script.
Router -> AgentService (services/browser_use_service.py): generate_script(goal, target_url, dom_structure, constraints).
AgentService -> Prompt Template (prompts/browser_use.py): build user prompt with DOM info.
AgentService: call the LLM and parse the response.
AgentService -> Sanitizer (utils/agent_sanitizer.py): validate JSON and actions; the sanitizer returns problems or the sanitized plan.
AgentService -> Router: {ok, action_plan} or error; Router -> Side Panel: response.
Side Panel -> Background Script: execute actions (tab/window control or injected DOM); Background Script -> Side Panel: execution results.

Detailed Component Analysis

AgentService: Intelligent Script Generation

AgentService orchestrates the generation of a JSON action plan:

Formats DOM info for the prompt, including URL, title, and interactive elements.
Constructs a user prompt with explicit rules for DOM vs tab actions, search URL construction, and constraints.
Invokes the LLM pipeline and parses the response.
Sanitizes and validates the JSON plan, reporting problems and returning raw response context on failure.
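
The four steps above can be sketched as a small pipeline. This is a dependency-free illustration, not the service's actual code. The LLM chain is replaced by a call_llm callable and the sanitizer by a validate callable. The element keys (tag, id, text), the MAX_ELEMENTS cap, the {"actions": [...]} plan shape, and the 500-character raw_response preview are all assumptions.

```python
import json
from typing import Any, Callable, Optional

MAX_ELEMENTS = 30  # hypothetical cap on elements included in the prompt
FENCE = "`" * 3    # models often wrap JSON in a Markdown code fence


def format_dom_info(dom: Optional[dict[str, Any]]) -> str:
    """Render URL, title, and a capped list of interactive elements as text."""
    if not dom:
        return "No DOM information available."
    lines = [f"URL: {dom.get('url', '')}", f"Title: {dom.get('title', '')}"]
    for el in dom.get("interactive", [])[:MAX_ELEMENTS]:
        ident = f"#{el['id']}" if el.get("id") else ""
        text = (el.get("text") or "")[:60]  # keep element text short
        lines.append(f"- <{el.get('tag', '?')}{ident}> {text}".rstrip())
    return "\n".join(lines)


def extract_json(raw: str) -> str:
    """Strip an optional surrounding code fence from the model output."""
    text = raw.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return text.strip()


def generate_script(
    goal: str,
    call_llm: Callable[[str], str],
    validate: Callable[[Any], list[str]],
    target_url: Optional[str] = None,
    dom_structure: Optional[dict[str, Any]] = None,
    constraints: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    # Format DOM info and build the user prompt.
    prompt = "\n\n".join([
        f"Goal: {goal}",
        f"Target URL: {target_url or 'none'}",
        f"Constraints: {json.dumps(constraints or {})}",
        "Current page:\n" + format_dom_info(dom_structure),
    ])
    raw = call_llm(prompt)
    # Parse, then validate; on failure return problems plus a raw preview.
    try:
        plan = json.loads(extract_json(raw))
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"Invalid JSON: {exc}",
                "problems": [], "raw_response": raw[:500]}
    problems = validate(plan)
    if problems:
        return {"ok": False, "error": "Plan failed validation",
                "problems": problems, "raw_response": raw[:500]}
    return {"ok": True, "action_plan": plan}


# Example with a stub in place of the LLM chain.
def fake_llm(prompt: str) -> str:
    return FENCE + 'json\n{"actions": [{"type": "OPEN_TAB", "url": "https://example.com"}]}\n' + FENCE


result = generate_script("Open example.com", fake_llm, validate=lambda plan: [])
```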

Diagram: generate_script flow
Entry -> Format DOM Info -> Build User Prompt -> Invoke LLM Chain -> Parse Response -> Sanitize and Validate JSON -> Valid plan?
Yes: return {ok: true, action_plan} -> Exit
No: return {ok: false, error, problems, raw_response} -> Exit

FastAPI Router: Endpoint Contract and Validation

The router:

Rejects requests with a missing or empty goal with 400 Bad Request.
Delegates to AgentService and handles validation errors separately from general errors.
Returns a standardized response model with the fields ok, action_plan, error, problems, and raw_response.
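
A framework-free sketch of this contract follows. It is not the project's FastAPI code: the request is a plain dict and the return value is a (status, body) pair. Mapping service-level validation errors to 400 and other exceptions to 500 is an assumption, and so is the PlanValidationError name. Plans that fail sanitization still come back as 200 with ok set to false, which matches the router sequence diagram.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class GenerateScriptResponse:
    """Mirrors the documented response fields."""
    ok: bool
    action_plan: Optional[dict[str, Any]] = None
    error: Optional[str] = None
    problems: list[str] = field(default_factory=list)
    raw_response: Optional[str] = None


class PlanValidationError(ValueError):
    """Hypothetical exception for input the service rejects as invalid."""


def handle_generate_script(
    body: dict[str, Any], service: Callable[..., dict[str, Any]]
) -> tuple[int, Any]:
    goal = str(body.get("goal") or "").strip()
    if not goal:
        return 400, {"detail": "goal is required"}
    try:
        result = service(
            goal=goal,
            target_url=body.get("target_url"),
            dom_structure=body.get("dom_structure"),
            constraints=body.get("constraints"),
        )
        response = GenerateScriptResponse(**result)
    except PlanValidationError as exc:  # assumed mapping: validation -> 400
        return 400, {"detail": str(exc)}
    except Exception as exc:  # assumed mapping: anything else -> 500
        return 500, {"detail": f"Script generation failed: {exc}"}
    # A plan that failed sanitization is still 200, with ok=False and problems.
    return 200, response
```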

Diagram: router sequence
Client -> Router (routers/browser_use.py): POST /generate-script
Router -> GenerateScriptRequest: validate request
If goal is missing: Router -> Client: 400 Bad Request
Otherwise: Router -> AgentService: generate_script(...), which returns {ok, action_plan} or {ok: false, error/problems}; Router -> GenerateScriptResponse: serialize response; Router -> Client: 200 OK with response

BrowserActionInput Schema and Tool Integration

The schema defines:

goal: The user’s instruction.
target_url: Optional URL to navigate to.
dom_structure: Optional DOM snapshot for context.
constraints: Optional constraints for the action.

The tool wraps AgentService and exposes a structured tool for agent workflows.
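
A minimal sketch of the schema and the tool wrapper follows. The project presumably declares BrowserActionInput as a Pydantic model for a LangChain structured tool. To stay dependency-free, this version uses a dataclass instead. make_browser_action_tool is a hypothetical name for a wrapper that validates arguments before delegating to AgentService.generate_script.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Optional


@dataclass
class BrowserActionInput:
    """Input contract for the browser action tool (dataclass stand-in)."""
    goal: str
    target_url: Optional[str] = None
    dom_structure: Optional[dict[str, Any]] = None
    constraints: Optional[dict[str, Any]] = None

    def __post_init__(self) -> None:
        # goal is the only required field and must carry an instruction.
        if not isinstance(self.goal, str) or not self.goal.strip():
            raise ValueError("goal must be a non-empty string")


def make_browser_action_tool(
    generate_script: Callable[..., dict[str, Any]]
) -> Callable[..., dict[str, Any]]:
    """Hypothetical wrapper: validate tool-call arguments, then delegate."""
    def browser_action(**kwargs: Any) -> dict[str, Any]:
        # Unknown keys raise TypeError; an empty goal raises ValueError.
        args = BrowserActionInput(**kwargs)
        return generate_script(**asdict(args))
    return browser_action


# Example with a stub standing in for AgentService.generate_script.
tool = make_browser_action_tool(lambda **kw: {"ok": True, "received": kw})
result = tool(goal="Click the login button", target_url="https://example.com/login")
```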

Diagram: class relationship
BrowserActionInput { goal: string, target_url: string, dom_structure: dict, constraints: dict } is used by the tool to call AgentService.generate_script(goal, target_url, dom_structure, constraints), which returns a dict.

DOM Structure Analysis and Interactive Elements

The extension captures DOM context for the current tab:

Queries the active tab and executes a content script function that collects interactive elements (links, buttons, inputs, selects, textareas, and elements with ARIA roles).
Keeps only visible elements and extracts their tag, id, class, type, placeholder, name, aria-label, and innerText/textContent.
Caps the number of interactive elements to bound payload size, and returns the URL, title, and raw HTML alongside the interactive list.
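
The capture itself runs as TypeScript in the page, where visibility is determined by layout and computed styles. To show the filtering logic in isolation, the sketch below applies the same steps in Python to node records that have already been extracted. The node shape (tag, visible, attrs, text), the ARIA role set, and MAX_INTERACTIVE = 50 are assumptions, not values taken from executeAgent.ts.

```python
from typing import Any

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Hypothetical set of ARIA roles treated as interactive.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "combobox", "menuitem", "tab"}
KEPT_ATTRS = ("id", "class", "type", "placeholder", "name", "aria-label")
MAX_INTERACTIVE = 50  # hypothetical cap on the interactive list


def collect_interactive(
    nodes: list[dict[str, Any]], limit: int = MAX_INTERACTIVE
) -> list[dict[str, Any]]:
    """Keep visible interactive nodes, extract key attributes, cap the count."""
    out: list[dict[str, Any]] = []
    for node in nodes:
        tag = str(node.get("tag", "")).lower()
        attrs = node.get("attrs", {})
        if tag not in INTERACTIVE_TAGS and attrs.get("role") not in INTERACTIVE_ROLES:
            continue
        if not node.get("visible", False):  # precomputed visibility flag
            continue
        entry: dict[str, Any] = {"tag": tag}
        entry.update({k: attrs[k] for k in KEPT_ATTRS if attrs.get(k)})
        text = " ".join((node.get("text") or "").split())  # collapse whitespace
        if text:
            entry["text"] = text[:100]
        out.append(entry)
        if len(out) >= limit:
            break
    return out


def build_snapshot(url: str, title: str, nodes: list[dict[str, Any]], raw_html: str) -> dict[str, Any]:
    """Assemble the {url, title, interactive, raw_html} payload."""
    return {"url": url, "title": title,
            "interactive": collect_interactive(nodes), "raw_html": raw_html}
```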

Diagram: DOM capture flow (source: executeAgent.ts)
Capture DOM -> Query Active Tab -> Execute Content Script to Collect DOM -> Filter Visible Elements -> Extract Attributes and Text -> Limit Interactive List -> Return {url, title, interactive, raw_html} -> Done

Action Planning and Execution Patterns

The extension supports two execution modes:

Side panel executor: Parses slash commands, resolves agent endpoints, captures DOM context, and executes the returned action plan.
Background runner: Receives action plans and executes them with robust tab/window control and DOM injection.
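
The background runner's dispatch between tab/window control and DOM injection can be sketched as a loop. The real runner is TypeScript in the background script. Here, run_tab_action and run_dom_action are hypothetical callables that stand in for the chrome.tabs calls and content-script injection. The 0.3-second delay is an assumed value for the throttling described under Performance Considerations. Stopping at the first failed action is this sketch's choice, not documented behavior.

```python
import time
from typing import Any, Callable

TAB_ACTIONS = {"OPEN_TAB", "CLOSE_TAB", "SWITCH_TAB", "NAVIGATE", "RELOAD_TAB", "DUPLICATE_TAB"}
DOM_ACTIONS = {"CLICK", "TYPE", "SCROLL", "WAIT", "SELECT", "EXECUTE_SCRIPT"}

Handler = Callable[[dict[str, Any]], Any]  # hypothetical per-action handler


def execute_plan(
    actions: list[dict[str, Any]],
    run_tab_action: Handler,
    run_dom_action: Handler,
    delay_s: float = 0.3,  # assumed throttle between actions
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict[str, Any]]:
    """Run actions in order, routing each to the tab or DOM handler."""
    results: list[dict[str, Any]] = []
    for i, action in enumerate(actions):
        kind = action.get("type")
        try:
            if kind in TAB_ACTIONS:
                detail = run_tab_action(action)  # tab/window control
            elif kind in DOM_ACTIONS:
                detail = run_dom_action(action)  # injected into the page
            else:
                raise ValueError(f"unknown action type {kind!r}")
        except Exception as exc:
            results.append({"index": i, "type": kind, "ok": False, "error": str(exc)})
            break  # this sketch stops at the first failure
        results.append({"index": i, "type": kind, "ok": True, "detail": detail})
        if i < len(actions) - 1:
            sleep(delay_s)  # give the page time to settle between actions
    return results
```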

Diagram: execution sequence
Side Panel -> Background Script: execute actions
For each action:
Tab/window control: Background Script -> Browser Tabs: create, close, switch, navigate, reload, or duplicate; Browser Tabs -> Background Script: status
DOM manipulation: Background Script -> Content Script: inject script to CLICK/TYPE/SCROLL/WAIT; Content Script -> Background Script: result
Background Script -> Side Panel: aggregated results

Agent Orchestration and Tool Integration

The React agent integrates browser actions with other tools:

Builds a LangGraph workflow with tools including the browser action tool.
Routes tool calls to the appropriate tool and returns results to the agent.
Supports dynamic tool composition based on context (e.g., Google tokens, JIIT session).
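
The project builds this workflow with LangGraph. The sketch below shows only the underlying idea, without the library: compose a tool set from context and route a model-issued tool call by name. The tool names google_tool and jiit_tool and the {"name", "args"} call shape are hypothetical placeholders.

```python
from typing import Any, Callable, Optional

Tool = Callable[..., Any]


def build_tools(
    browser_action: Tool,
    google_token: Optional[str] = None,
    jiit_session: Optional[dict[str, Any]] = None,
) -> dict[str, Tool]:
    """Compose the tool set from context; the extra tools are placeholders."""
    tools: dict[str, Tool] = {"browser_action": browser_action}
    if google_token:
        # Hypothetical Google-backed tool, offered only when a token exists.
        tools["google_tool"] = lambda query: f"google results for {query!r}"
    if jiit_session:
        # Hypothetical JIIT portal tool, offered only with a session.
        tools["jiit_tool"] = lambda: "jiit portal data"
    return tools


def route_tool_call(tools: dict[str, Tool], call: dict[str, Any]) -> dict[str, Any]:
    """Dispatch one tool call by name and wrap the result for the agent."""
    name = call.get("name", "")
    if name not in tools:
        return {"tool": name, "error": f"unknown tool {name!r}"}
    try:
        return {"tool": name, "result": tools[name](**call.get("args", {}))}
    except Exception as exc:  # report tool failures back to the agent
        return {"tool": name, "error": str(exc)}
```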

Diagram: React Agent (react_agent.py) -> Agent Tools (react_tools.py) -> Browser Action Tool (tools/browser_use/tool.py) -> AgentService (services/browser_use_service.py)

Dependency Analysis

Service-to-Prompt: AgentService depends on the prompt template to construct the user prompt.
Service-to-Sanitizer: AgentService relies on the sanitizer to validate the JSON and detect unsafe patterns.
Router-to-Service: The FastAPI router delegates to AgentService and returns standardized responses.
Extension-to-Service: The side panel executor calls the backend; the background script executes actions locally when needed.
Agent System: The React agent composes tools, including the browser action tool, so browser actions can be combined with other tools in one workflow.

Diagram: dependency graph
AgentService -> Prompt Template, Sanitizer
Router -> AgentService
Side Panel -> Router
Background Script -> Action Execution, Content Script
React Agent -> Agent Tools -> Browser Action Tool -> AgentService

Performance Considerations

DOM capture limits: Interactive element lists are truncated to avoid large payloads and excessive token usage.
Action throttling: A small delay is introduced between actions to prevent rapid-fire operations that may overwhelm the page.
LLM token budget: DOM info is limited to a subset of interactive elements and concise text to fit within context windows.
Tab operations: Navigation and reload operations wait for completion to avoid race conditions.
Sanitization overhead: JSON validation and safety checks occur synchronously; batching or caching could reduce repeated work.
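
Waiting for a navigation or reload to finish, with a timeout fallback, can be sketched as a polling loop. In the extension this is TypeScript against the chrome.tabs API, where a tab's status becomes "complete" once loading finishes. Here get_status is a hypothetical callable that stands in for that lookup. The 15-second default and 0.25-second poll interval are assumptions. The clock and sleep parameters exist only so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_tab_complete(
    get_status: Callable[[], str],  # stand-in for a tab status lookup (assumption)
    timeout_s: float = 15.0,
    poll_s: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the tab reports "complete", False on timeout.

    False is the timeout fallback: the caller may continue anyway, so later
    DOM actions should still guard against a partially loaded page.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "complete":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

An event listener on tab updates (chrome.tabs.onUpdated) avoids polling entirely. Polling is used here only to keep the sketch self-contained.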

Troubleshooting Guide

Common issues and resolutions:

Missing goal: The router returns a 400 error when goal is empty.
Validation failures: If the sanitizer detects missing fields or invalid action types, the service returns problems and a raw response preview.
DOM action on chrome:// pages: The prompt explicitly forbids DOM actions on chrome-specific URLs; use tab control actions instead.
Search URL construction: For search intents, use OPEN_TAB with a fully constructed search URL rather than opening blank tabs and typing.
Element not found: DOM actions throw errors when selectors do not match; verify selectors and ensure the page has loaded.
Tab switching: SWITCH_TAB requires either tabId or direction; ensure one is provided.
Execution timeouts: Tab operations fall back after a timeout, so a slow-loading page may still be loading when the next action runs; add a WAIT before DOM actions on such pages.

Conclusion

The browser automation system combines a backend service with an extension runtime. It turns user goals into precise JSON action plans, validates them, and executes them through tab/window control or DOM injection. This supports scenarios ranging from targeted form filling to multi-tab workflows. The React agent extends this by combining browser actions with other tools in a single agent workflow.

Appendices

Browser Action Types and Constraints

DOM Manipulation Actions: CLICK, TYPE, SCROLL, WAIT, SELECT, EXECUTE_SCRIPT
Tab/Window Control Actions: OPEN_TAB, CLOSE_TAB, SWITCH_TAB, NAVIGATE, RELOAD_TAB, DUPLICATE_TAB
Constraints: Provide target_url for navigation, ensure selectors are specific and reliable, and avoid DOM actions on chrome:// pages.
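
These constraints lend themselves to mechanical checks. The sketch below shows the kind of structural validation the sanitizer is described as performing (required fields and valid action types). It also adds the SWITCH_TAB and chrome:// rules from the troubleshooting guide. It is illustrative only. The {"actions": [...]} plan shape and the selector, value, url, and script field names are assumptions, and so is enforcing the chrome:// rule in code, since the document attributes that rule to the prompt.

```python
from typing import Any

DOM_ACTIONS = {"CLICK", "TYPE", "SCROLL", "WAIT", "SELECT", "EXECUTE_SCRIPT"}
TAB_ACTIONS = {"OPEN_TAB", "CLOSE_TAB", "SWITCH_TAB", "NAVIGATE", "RELOAD_TAB", "DUPLICATE_TAB"}
ALL_ACTIONS = DOM_ACTIONS | TAB_ACTIONS
PAGE_CHANGING = {"OPEN_TAB", "NAVIGATE", "SWITCH_TAB"}

# Hypothetical per-action required fields; the real schema may use other names.
REQUIRED_FIELDS = {
    "CLICK": ("selector",),
    "TYPE": ("selector", "value"),
    "SELECT": ("selector", "value"),
    "OPEN_TAB": ("url",),
    "NAVIGATE": ("url",),
    "EXECUTE_SCRIPT": ("script",),
}


def validate_plan(plan: Any, page_url: str = "") -> list[str]:
    """Return a list of problems; an empty list means the plan looks valid."""
    actions = plan.get("actions") if isinstance(plan, dict) else None
    if not isinstance(actions, list) or not actions:
        return ["plan must contain a non-empty 'actions' list"]
    problems: list[str] = []
    on_chrome_page = page_url.startswith("chrome://")
    for i, action in enumerate(actions):
        if not isinstance(action, dict):
            problems.append(f"action {i}: must be an object")
            continue
        kind = action.get("type")
        if not isinstance(kind, str) or kind not in ALL_ACTIONS:
            problems.append(f"action {i}: invalid type {kind!r}")
            continue
        for key in REQUIRED_FIELDS.get(kind, ()):
            if not action.get(key):
                problems.append(f"action {i}: {kind} requires '{key}'")
        if kind == "SWITCH_TAB" and "tabId" not in action and "direction" not in action:
            problems.append(f"action {i}: SWITCH_TAB requires tabId or direction")
        if kind in DOM_ACTIONS and on_chrome_page:
            problems.append(f"action {i}: {kind} is not allowed on chrome:// pages")
        if kind in PAGE_CHANGING:
            # After a tab change the URL is known only if the action names one.
            on_chrome_page = str(action.get("url", "")).startswith("chrome://")
    return problems
```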

Example Scenarios

Open a new tab and search: Use OPEN_TAB with a constructed search URL; avoid typing into chrome:// pages.
Fill a login form: Capture DOM, select precise selectors for input fields, and use TYPE with WAIT for page readiness.
Switch tabs and click a button: Use SWITCH_TAB with tabId or direction, then CLICK on the target element.
Scroll and extract links: Use SCROLL to reveal content, then use EXECUTE_SCRIPT to gather link data safely.
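
Two of these scenarios can be written down as plan builders. The search engine URL, the {"actions": [...]} plan shape, and the direction value "next" and WAIT duration field are all assumptions made for illustration. The point is that a search intent becomes a single OPEN_TAB with a URL-encoded query, with no typing into a chrome:// page.

```python
from urllib.parse import quote_plus

# Assumed search engine; any engine with a query-string search URL works.
SEARCH_URL = "https://www.google.com/search?q={}"


def build_search_plan(query: str) -> dict:
    """Search intent: one OPEN_TAB with a fully built URL, no typing."""
    return {"actions": [{"type": "OPEN_TAB", "url": SEARCH_URL.format(quote_plus(query))}]}


def build_switch_and_click_plan(selector: str, direction: str = "next") -> dict:
    """Switch tabs, give the page a moment, then click a specific selector."""
    return {"actions": [
        {"type": "SWITCH_TAB", "direction": direction},  # hypothetical value
        {"type": "WAIT", "duration": 500},  # hypothetical field, milliseconds
        {"type": "CLICK", "selector": selector},
    ]}
```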

Security Considerations

Custom script validation: The sanitizer rejects potentially dangerous patterns in EXECUTE_SCRIPT actions.
DOM injection safety: Content scripts and background scripts restrict DOM manipulation to trusted selectors and validated contexts.
CORS and permissions: Ensure the extension has scripting permissions for target origins and respects site policies.
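
The document does not list the sanitizer's actual patterns. As an illustration only, a denylist check for EXECUTE_SCRIPT bodies might look like the sketch below. The pattern list and the check_custom_script name are hypothetical.

```python
import re

# Hypothetical denylist; the project's real patterns are not documented here.
DANGEROUS_PATTERNS = [
    (re.compile(r"\beval\s*\("), "eval()"),
    (re.compile(r"\bnew\s+Function\s*\("), "Function constructor"),
    (re.compile(r"\bdocument\.cookie\b"), "cookie access"),
    (re.compile(r"\b(?:localStorage|sessionStorage)\b"), "web storage access"),
    (re.compile(r"\b(?:fetch|XMLHttpRequest)\b"), "network request"),
    (re.compile(r"\bchrome\.\w+"), "extension API access"),
]


def check_custom_script(script: str) -> list[str]:
    """Return labels of every denylisted pattern found in the script."""
    return [label for pattern, label in DANGEROUS_PATTERNS if pattern.search(script)]
```

Pattern matching is a heuristic, not a sandbox. Obfuscation such as window["ev" + "al"] slips past it, so it should complement narrow extension permissions rather than replace them.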
